GBI: A Generalized R-Tree Bulk-Insertion Strategy
Authors
Abstract
A lot of recent work has studied strategies related to bulk loading of large data sets into multidimensional index structures. In this paper, we address the problem of bulk insertions into existing index structures, with particular focus on R-trees, which are an important class of index structures used widely in commercial database systems. We propose a new technique which, as opposed to the current technique of inserting data one by one, bulk inserts entire new incoming datasets into an active R-tree. This technique, called GBI (for Generalized Bulk Insertion), partitions the new datasets into sets of clusters and outliers, constructs an R-tree (small tree) from each cluster, identifies and prepares suitable locations in the original R-tree (large tree) for insertion, and lastly performs the insertions of the small trees and the outliers into the large tree in bulk. Our experimental studies demonstrate that GBI does especially well relative to the existing technique for randomly located data as well as for real datasets that contain few natural clusters, while also consistently outperforming the alternate technique in all other circumstances.
Index Terms: Bulk insertion, Bulk loading, Clustering, R-Tree, Index Structures, Query Performance.
This work was supported in part by the University of Michigan ITS Research Center of Excellence grant DTFH X Sub, sponsored by the U.S. Dept. of Transportation and by the Michigan Dept. of Transportation. Dr. Rundensteiner thanks IBM for the Corporate IBM partnership award, and Li Chen thanks IBM for the Corporate IBM fellowship as well as mentoring from the IBM Toronto Labora...
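The first two steps named in the abstract (splitting incoming data into clusters and outliers, then summarizing each cluster so a small tree can be built from it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the grid-cell bucketing is a simple stand-in and is not the clustering method used by GBI, and the names `partition`, `mbr`, `plan_bulk_insert`, `cell`, and `min_size` are hypothetical. Only the minimum bounding rectangle (MBR) of each would-be small tree is computed; no R-tree is actually built.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def partition(points: List[Point], cell: float, min_size: int):
    """Split points into dense-cell clusters and leftover outliers.

    Assumption: grid bucketing stands in for a real clustering step;
    it is NOT the method described in the GBI paper.
    """
    buckets: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        buckets[(int(x // cell), int(y // cell))].append((x, y))
    # Cells holding at least min_size points are treated as clusters.
    clusters = [b for b in buckets.values() if len(b) >= min_size]
    # Points in sparse cells are treated as outliers.
    outliers = [p for b in buckets.values() if len(b) < min_size for p in b]
    return clusters, outliers


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def plan_bulk_insert(points: List[Point], cell: float = 10.0, min_size: int = 3):
    """Return the root MBR of each would-be small tree, plus the outliers."""
    clusters, outliers = partition(points, cell, min_size)
    small_tree_mbrs = [mbr(c) for c in clusters]
    return small_tree_mbrs, outliers
```

In a fuller implementation, each MBR would guide the search for a suitable location in the large tree, and the outliers would be inserted individually; both steps are left out of this sketch.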
Similar resources
Improving Performance with Bulk-Inserts in Oracle R-Trees
Spatial indexes play a major role in fast access to spatial and location data. Most commercial applications insert new data in bulk: in batches or arrays. In this paper, we propose a novel bulk insertion technique for R-Trees that is fast and does not compromise on the quality of the resulting index. We present our experiences with incorporating the proposed bulk insertion strategies into Oracl...
Bulk insertion for R-trees by seeded clustering
We propose a scalable technique called Seeded Clustering that allows us to maintain R-tree indices by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, which is copied from the top k levels of a target R-tree, to classify input data objects into clusters. We then build an R-tree for each of the clusters and insert the input R-trees into the target R-t...
Bulk Insertion for R-Tree by Seeded Clustering
In many scientific and commercial applications, such as Earth Observation System (EOSDIS) and mobile phone services tracking a large number of clients, it is a daunting task to archive and index an ever-increasing volume of complex data that are continuously added to databases. To efficiently manage multidimensional data in scientific and data warehousing environments, R-tree based index structures hav...
Two-Phased Bulk Insertion by Seeded Clustering for R-Trees
With great advances in the mobile technology and wireless communications, users expect to be online anytime anywhere. However, due to the high cost of being online, applications are still implemented as partially connected to the server. In many data-intensive mobile client/server frameworks, it is a daunting task to archive and index such a mass volume of complex data that are continuously add...
Bulk Insertions into R-Trees
A lot of recent work has focused on bulk loading of data into multidimensional index structures in order to efficiently construct such structures for large datasets. Previous work on bulk loading data focused on building index structures from scratch, while the problem of bulk insertions into existing index structures has been largely overlooked. In this paper, we address this new problem with ...